A Two-Stage Approach for Generating Topic Models

Authors

  • Yang Gao
  • Yue Xu
  • Yuefeng Li
  • Bin Liu
Abstract

Topic modeling has been widely used in information retrieval, text mining, and text classification. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This has two main shortcomings: first, popular or common words occur frequently across different topics, which makes the topics ambiguous to interpret; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.
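
The abstract does not spell out the two stages, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that stage one (a statistical topic model such as LDA) has already assigned a topic to every word occurrence, and that stage two mines frequent word sets per topic so that a topic can be represented by co-occurring patterns rather than single words. The names `build_transactions`, `frequent_patterns`, `min_support`, `max_size`, and the toy `assignments` data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_transactions(assignments):
    """Group each document's words by their assigned topic.

    assignments: list of documents, each a list of (word, topic_id) pairs.
    The pairs are ASSUMED to come from a stage-one topic model such as LDA.
    Returns {topic_id: [set of words from one document, ...]}.
    """
    transactions = defaultdict(list)
    for doc in assignments:
        words_by_topic = defaultdict(set)
        for word, topic in doc:
            words_by_topic[topic].add(word)
        for topic, words in words_by_topic.items():
            transactions[topic].append(words)
    return dict(transactions)


def frequent_patterns(transactions, min_support=2, max_size=2):
    """Count word sets of up to max_size words and keep the frequent ones.

    This is a brute-force stand-in for a real pattern-mining algorithm;
    it is only practical for small vocabularies.
    """
    counts = defaultdict(int)
    for words in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(words), size):
                counts[itemset] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical toy input: three documents with per-word topic assignments.
assignments = [
    [("topic", 0), ("model", 0), ("goal", 1)],
    [("topic", 0), ("model", 0), ("match", 1)],
    [("model", 0), ("goal", 1), ("match", 1)],
]

patterns_by_topic = {
    topic: frequent_patterns(tx)
    for topic, tx in build_transactions(assignments).items()
}
```

In this toy run, topic 0 keeps the pair `("model", "topic")` because both words are assigned to it in two documents, while topic 1 keeps only single words. A pattern of this kind is the sort of multi-word representation the abstract contrasts with single-word topic descriptions.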





Publication date: 2013